In this project, I focus on exploring key insights into the Seattle Airbnb market through interactive data visualization and text mining.
The Seattle Airbnb dataset contains files describing the Airbnb listings in Seattle, the calendar availability of each listing, user reviews of the listings, and the geometry of each neighbourhood. Using this dataset, I attempt to answer business questions from three aspects:
- Location impact on the Seattle Airbnb market
- Advice for tourists
- Insights for hosts
import pandas as pd
pd.set_option("display.max_columns", None)  # full option name; the short "max_columns" is ambiguous in newer pandas
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# for interactive maps
import folium
from folium.plugins import FastMarkerCluster
import geopandas as gpd
import branca
# for interactive plotly graphs
import plotly.graph_objs as go
# for sentiment analysis
import re
import nltk
nltk.download('stopwords')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud
from langdetect import detect
import warnings
warnings.filterwarnings("ignore")
The Seattle Airbnb dataset used here was obtained from insideairbnb.com and compiled on October 25, 2020. Let's first get an overview of the dataset.
LISTINGS data
# read in the detailed listings data for Seattle
listings_df = pd.read_csv("./listings.csv")
listings_df.head()
| | id | listing_url | scrape_id | last_scraped | name | description | neighborhood_overview | picture_url | host_id | host_url | host_name | host_since | host_location | host_about | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_thumbnail_url | host_picture_url | host_neighbourhood | host_listings_count | host_total_listings_count | host_verifications | host_has_profile_pic | host_identity_verified | neighbourhood | neighbourhood_cleansed | neighbourhood_group_cleansed | latitude | longitude | property_type | room_type | accommodates | bathrooms | bathrooms_text | bedrooms | beds | amenities | price | minimum_nights | maximum_nights | minimum_minimum_nights | maximum_minimum_nights | minimum_maximum_nights | maximum_maximum_nights | minimum_nights_avg_ntm | maximum_nights_avg_ntm | calendar_updated | has_availability | availability_30 | availability_60 | availability_90 | availability_365 | calendar_last_scraped | number_of_reviews | number_of_reviews_ltm | number_of_reviews_l30d | first_review | last_review | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2318 | https://www.airbnb.com/rooms/2318 | 20201025051148 | 2020-10-25 | Casa Madrona - Urban Oasis 1 block from the park! | Gorgeous, architect remodeled, Dutch Colonial ... | Madrona is a hidden gem of a neighborhood. It ... | https://a0.muscache.com/pictures/02973ad3-a7a3... | 2536 | https://www.airbnb.com/users/show/2536 | Megan | 2008-08-26 | Seattle, Washington, United States | I welcome guests from all walks of life and ev... | within a day | 100% | 78% | t | https://a0.muscache.com/im/pictures/user/016a1... | https://a0.muscache.com/im/pictures/user/016a1... | Minor | 2.0 | 2.0 | ['email', 'phone', 'reviews', 'jumio', 'offlin... | t | t | Seattle, Washington, United States | Madrona | Central Area | 47.61082 | -122.29082 | Entire house | Entire home/apt | 9 | NaN | 2.5 baths | 4.0 | 4.0 | ["Children\u2019s books and toys", "Iron", "Ha... | $295.00 | 1 | 1125 | 1 | 2 | 1125 | 1125 | 1.0 | 1125.0 | NaN | t | 0 | 0 | 26 | 26 | 2020-10-25 | 32 | 4 | 0 | 2008-09-15 | 2020-02-01 | 100.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | STR-OPLI-19-002837 | f | 2 | 2 | 0 | 0 | 0.22 |
| 1 | 9419 | https://www.airbnb.com/rooms/9419 | 20201025051148 | 2020-10-25 | Glorious sun room w/ memory foambed | Keeping you safe is our priority, we are adher... | Lots of restaurants (see our guide book) bars,... | https://a0.muscache.com/pictures/56645186/e5fb... | 30559 | https://www.airbnb.com/users/show/30559 | Angielena | 2009-08-09 | Seattle, Washington, United States | I am a visual artist who is the director of ... | within a few hours | 100% | 89% | t | https://a0.muscache.com/im/users/30559/profile... | https://a0.muscache.com/im/users/30559/profile... | Georgetown | 8.0 | 8.0 | ['email', 'phone', 'reviews', 'jumio', 'offlin... | t | t | Seattle, Washington, United States | Georgetown | Other neighborhoods | 47.55017 | -122.31937 | Private room in apartment | Private room | 2 | NaN | 3 shared baths | 1.0 | 2.0 | ["Iron", "Hangers", "Lock on bedroom door", "H... | $55.00 | 2 | 180 | 2 | 2 | 180 | 180 | 2.0 | 180.0 | NaN | t | 29 | 59 | 89 | 364 | 2020-10-25 | 148 | 2 | 0 | 2010-07-30 | 2019-12-27 | 93.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | str-opli-19-003039 | f | 8 | 0 | 8 | 0 | 1.19 |
| 2 | 9531 | https://www.airbnb.com/rooms/9531 | 20201025051148 | 2020-10-25 | The Adorable Sweet Orange Craftsman | The Sweet Orange is a delightful and spacious ... | The neighborhood is awesome! Just far enough ... | https://a0.muscache.com/pictures/30470355/052c... | 31481 | https://www.airbnb.com/users/show/31481 | Cassie | 2009-08-13 | Seattle, Washington, United States | The Sweet Orange reflects my passion and zest ... | within a day | 100% | 64% | t | https://a0.muscache.com/im/users/31481/profile... | https://a0.muscache.com/im/users/31481/profile... | The Junction | 2.0 | 2.0 | ['email', 'phone', 'reviews', 'offline_governm... | t | t | Seattle, Washington, United States | Fairmount Park | West Seattle | 47.55539 | -122.38474 | Entire house | Entire home/apt | 4 | NaN | 1 bath | 2.0 | 3.0 | ["Iron", "TV", "Hangers", "Cable TV", "Private... | $155.00 | 28 | 1125 | 28 | 28 | 1125 | 1125 | 28.0 | 1125.0 | NaN | t | 0 | 0 | 19 | 294 | 2020-10-25 | 40 | 1 | 0 | 2012-01-12 | 2019-12-30 | 100.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | STR-OPLI-19-002182 | f | 2 | 2 | 0 | 0 | 0.37 |
| 3 | 9534 | https://www.airbnb.com/rooms/9534 | 20201025051148 | 2020-10-25 | The Coolest Tangerine Dream MIL! | Welcome to my delicious Tangerine Dream! A co... | The neighborhood is the best of two worlds...w... | https://a0.muscache.com/pictures/30476721/0751... | 31481 | https://www.airbnb.com/users/show/31481 | Cassie | 2009-08-13 | Seattle, Washington, United States | The Sweet Orange reflects my passion and zest ... | within a day | 100% | 64% | t | https://a0.muscache.com/im/users/31481/profile... | https://a0.muscache.com/im/users/31481/profile... | The Junction | 2.0 | 2.0 | ['email', 'phone', 'reviews', 'offline_governm... | t | t | Seattle, Washington, United States | Fairmount Park | West Seattle | 47.55624 | -122.38598 | Entire guest suite | Entire home/apt | 3 | NaN | 1 bath | 2.0 | 2.0 | ["Conditioner", "Iron", "TV", "Hangers", "Cabl... | $125.00 | 5 | 1125 | 5 | 5 | 1125 | 1125 | 5.0 | 1125.0 | NaN | t | 10 | 14 | 40 | 315 | 2020-10-25 | 53 | 8 | 0 | 2012-01-15 | 2020-08-31 | 100.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | STR-OPLI-19-002182 | f | 2 | 2 | 0 | 0 | 0.50 |
| 4 | 9596 | https://www.airbnb.com/rooms/9596 | 20201025051148 | 2020-10-25 | the down home , spacious, central and fab! | We are in a great neighborhood, quiet, full of... | if you arrive early for check in at 3, I reco... | https://a0.muscache.com/pictures/665252/102d18... | 14942 | https://www.airbnb.com/users/show/14942 | Joyce | 2009-04-26 | Seattle, Washington, United States | I am a therapist/innkeeper.I know my city well... | within a few hours | 90% | 94% | f | https://a0.muscache.com/im/users/14942/profile... | https://a0.muscache.com/im/users/14942/profile... | Wallingford | 5.0 | 5.0 | ['email', 'phone', 'facebook', 'reviews', 'kba'] | t | t | Seattle, Washington, United States | Wallingford | Other neighborhoods | 47.65479 | -122.33652 | Entire apartment | Entire home/apt | 4 | NaN | 1 bath | 1.0 | 4.0 | ["Iron", "TV", "Hangers", "Cable TV", "Hair dr... | $100.00 | 4 | 60 | 4 | 4 | 1125 | 1125 | 4.0 | 1125.0 | NaN | t | 0 | 0 | 6 | 6 | 2020-10-25 | 97 | 4 | 1 | 2011-06-15 | 2020-09-28 | 91.0 | 9.0 | 9.0 | 10.0 | 9.0 | 10.0 | 9.0 | STR-OPLI-19-002622 | f | 2 | 2 | 0 | 0 | 0.85 |
listings_df.info(verbose=True, show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4335 entries, 0 to 4334
Data columns (total 74 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | id | 4335 non-null | int64 |
| 1 | listing_url | 4335 non-null | object |
| 2 | scrape_id | 4335 non-null | int64 |
| 3 | last_scraped | 4335 non-null | object |
| 4 | name | 4335 non-null | object |
| 5 | description | 4323 non-null | object |
| 6 | neighborhood_overview | 3045 non-null | object |
| 7 | picture_url | 4335 non-null | object |
| 8 | host_id | 4335 non-null | int64 |
| 9 | host_url | 4335 non-null | object |
| 10 | host_name | 4327 non-null | object |
| 11 | host_since | 4327 non-null | object |
| 12 | host_location | 4321 non-null | object |
| 13 | host_about | 3104 non-null | object |
| 14 | host_response_time | 3761 non-null | object |
| 15 | host_response_rate | 3761 non-null | object |
| 16 | host_acceptance_rate | 3960 non-null | object |
| 17 | host_is_superhost | 4327 non-null | object |
| 18 | host_thumbnail_url | 4327 non-null | object |
| 19 | host_picture_url | 4327 non-null | object |
| 20 | host_neighbourhood | 3919 non-null | object |
| 21 | host_listings_count | 4327 non-null | float64 |
| 22 | host_total_listings_count | 4327 non-null | float64 |
| 23 | host_verifications | 4335 non-null | object |
| 24 | host_has_profile_pic | 4327 non-null | object |
| 25 | host_identity_verified | 4327 non-null | object |
| 26 | neighbourhood | 3045 non-null | object |
| 27 | neighbourhood_cleansed | 4335 non-null | object |
| 28 | neighbourhood_group_cleansed | 4335 non-null | object |
| 29 | latitude | 4335 non-null | float64 |
| 30 | longitude | 4335 non-null | float64 |
| 31 | property_type | 4335 non-null | object |
| 32 | room_type | 4335 non-null | object |
| 33 | accommodates | 4335 non-null | int64 |
| 34 | bathrooms | 0 non-null | float64 |
| 35 | bathrooms_text | 4332 non-null | object |
| 36 | bedrooms | 3740 non-null | float64 |
| 37 | beds | 4301 non-null | float64 |
| 38 | amenities | 4335 non-null | object |
| 39 | price | 4335 non-null | object |
| 40 | minimum_nights | 4335 non-null | int64 |
| 41 | maximum_nights | 4335 non-null | int64 |
| 42 | minimum_minimum_nights | 4335 non-null | int64 |
| 43 | maximum_minimum_nights | 4335 non-null | int64 |
| 44 | minimum_maximum_nights | 4335 non-null | int64 |
| 45 | maximum_maximum_nights | 4335 non-null | int64 |
| 46 | minimum_nights_avg_ntm | 4335 non-null | float64 |
| 47 | maximum_nights_avg_ntm | 4335 non-null | float64 |
| 48 | calendar_updated | 0 non-null | float64 |
| 49 | has_availability | 4335 non-null | object |
| 50 | availability_30 | 4335 non-null | int64 |
| 51 | availability_60 | 4335 non-null | int64 |
| 52 | availability_90 | 4335 non-null | int64 |
| 53 | availability_365 | 4335 non-null | int64 |
| 54 | calendar_last_scraped | 4335 non-null | object |
| 55 | number_of_reviews | 4335 non-null | int64 |
| 56 | number_of_reviews_ltm | 4335 non-null | int64 |
| 57 | number_of_reviews_l30d | 4335 non-null | int64 |
| 58 | first_review | 3508 non-null | object |
| 59 | last_review | 3508 non-null | object |
| 60 | review_scores_rating | 3494 non-null | float64 |
| 61 | review_scores_accuracy | 3462 non-null | float64 |
| 62 | review_scores_cleanliness | 3462 non-null | float64 |
| 63 | review_scores_checkin | 3462 non-null | float64 |
| 64 | review_scores_communication | 3462 non-null | float64 |
| 65 | review_scores_location | 3462 non-null | float64 |
| 66 | review_scores_value | 3462 non-null | float64 |
| 67 | license | 3030 non-null | object |
| 68 | instant_bookable | 4335 non-null | object |
| 69 | calculated_host_listings_count | 4335 non-null | int64 |
| 70 | calculated_host_listings_count_entire_homes | 4335 non-null | int64 |
| 71 | calculated_host_listings_count_private_rooms | 4335 non-null | int64 |
| 72 | calculated_host_listings_count_shared_rooms | 4335 non-null | int64 |
| 73 | reviews_per_month | 3508 non-null | float64 |

dtypes: float64(18), int64(21), object(35)
memory usage: 2.4+ MB
Overall, there are 4335 listings in Seattle as of October 25, 2020. The columns used below include id, neighbourhood_cleansed, latitude, longitude, room_type, accommodates and price, and none of these columns has null values (4335 non-null entries each).
CALENDAR data
# read in the calendar data for Seattle
calendar_df = pd.read_csv("./calendar.csv")
calendar_df.head()
| | listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights |
|---|---|---|---|---|---|---|---|
| 0 | 22153582 | 2020-10-25 | t | $144.00 | $144.00 | 3 | 1125 |
| 1 | 22153582 | 2020-10-26 | t | $141.00 | $141.00 | 3 | 1125 |
| 2 | 22153582 | 2020-10-27 | t | $149.00 | $149.00 | 3 | 1125 |
| 3 | 22153582 | 2020-10-28 | t | $96.00 | $96.00 | 3 | 1125 |
| 4 | 22153582 | 2020-10-29 | t | $102.00 | $102.00 | 3 | 1125 |
calendar_df.info(verbose=True, show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1582275 entries, 0 to 1582274
Data columns (total 7 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | listing_id | 1582275 non-null | int64 |
| 1 | date | 1582275 non-null | object |
| 2 | available | 1582275 non-null | object |
| 3 | price | 1582275 non-null | object |
| 4 | adjusted_price | 1582275 non-null | object |
| 5 | minimum_nights | 1582275 non-null | int64 |
| 6 | maximum_nights | 1582275 non-null | int64 |

dtypes: int64(3), object(4)
memory usage: 84.5+ MB
The calendar file has 365 records for each listing (4335 listings × 365 days = 1,582,275 rows), i.e., the price and availability of each listing are specified 365 days ahead. None of its columns has missing values.
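A minimal sketch of how the 365-records-per-listing claim can be checked with pandas. The toy calendar below (two hypothetical listings with three days each) stands in for calendar_df; on the real data, the same helper should return 365 for every listing_id.

```python
import pandas as pd

def rows_per_listing(calendar: pd.DataFrame) -> pd.Series:
    """Count how many calendar rows each listing_id has."""
    return calendar.groupby("listing_id").size()

# Toy calendar: two hypothetical listings with 3 days each instead of 365
toy_calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": ["2020-10-25", "2020-10-26", "2020-10-27"] * 2,
})
counts = rows_per_listing(toy_calendar)
print(counts.unique())  # -> [3]

# Same arithmetic for the real file: 4335 listings x 365 days
print(4335 * 365)  # -> 1582275
```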
REVIEWS data
# read in the detailed reviews data for Seattle
reviews_df = pd.read_csv("./reviews.csv")
reviews_df.head()
| | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 2318 | 146 | 2008-09-15 | 2451 | Kevin | 1000 times better than staying at a hotel. |
| 1 | 2318 | 126302712 | 2017-01-10 | 12332845 | Jessica | Our family (two couples, a two year old and an... |
| 2 | 2318 | 140977084 | 2017-04-01 | 4789466 | Ivan | Top of the list locations we have stayed at! T... |
| 3 | 2318 | 147262504 | 2017-04-25 | 55817131 | Mike | SUCH an awesome place. Very clean, quiet and s... |
| 4 | 2318 | 161806368 | 2017-06-18 | 113604590 | Pete | We flew quite a distance to be at our only dau... |
reviews_df.info(verbose=True, show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270875 entries, 0 to 270874
Data columns (total 6 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | listing_id | 270875 non-null | int64 |
| 1 | id | 270875 non-null | int64 |
| 2 | date | 270875 non-null | object |
| 3 | reviewer_id | 270875 non-null | int64 |
| 4 | reviewer_name | 270875 non-null | object |
| 5 | comments | 270752 non-null | object |

dtypes: int64(3), object(3)
memory usage: 12.4+ MB
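The reviews file holds the reviews for each listing. 123 comments are missing (270,875 rows but only 270,752 non-null comments), so only the available comments carry text for the sentiment analysis. Note that get_sentiment below converts a missing comment into the string 'nan' before scoring it, so those rows still receive a score.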
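Counting and dropping missing comments can be sketched as follows. The three toy rows are made up; the last line simply repeats the subtraction from the info() output above.

```python
import pandas as pd

# Toy reviews table (hypothetical rows); None marks a missing comment
toy_reviews = pd.DataFrame({
    "listing_id": [2318, 2318, 9419],
    "comments": ["Great stay!", None, "Clean and quiet."],
})

n_missing = toy_reviews["comments"].isna().sum()
available = toy_reviews.dropna(subset=["comments"])  # keep rows with text only
print(n_missing, len(available))  # -> 1 2

# For the real file: total rows minus non-null comments
print(270875 - 270752)  # -> 123
```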
NEIGHBOURHOODS data
# load the neighbourhood information
nbh_geo = gpd.read_file('neighbourhoods.geojson', driver='GeoJSON')
nbh_geo.head()
| | neighbourhood | neighbourhood_group | geometry |
|---|---|---|---|
| 0 | Wallingford | Other neighborhoods | MULTIPOLYGON (((-122.34731 47.66501, -122.3464... |
| 1 | West Queen Anne | Queen Anne | MULTIPOLYGON (((-122.35692 47.63959, -122.3569... |
| 2 | Adams | Ballard | MULTIPOLYGON (((-122.37634 47.67592, -122.3762... |
| 3 | West Woodland | Ballard | MULTIPOLYGON (((-122.37634 47.67592, -122.3760... |
| 4 | East Queen Anne | Queen Anne | MULTIPOLYGON (((-122.35692 47.63959, -122.3569... |
nbh_geo.info(verbose=True, show_counts=True)
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 3 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | neighbourhood | 91 non-null | object |
| 1 | neighbourhood_group | 91 non-null | object |
| 2 | geometry | 91 non-null | geometry |

dtypes: geometry(1), object(2)
memory usage: 2.3+ KB
The neighbourhoods file covers the geometry of 91 neighbourhoods in Seattle. Like the calendar file, the neighbourhoods file is complete.
Data preparation is therefore straightforward: the columns we need are complete, so there is no need to impute missing values. The only wrangling required is converting the price columns in the listings and calendar files from strings such as '$295.00' to floats.
# add a 'price_clean' column representing converted price values to the listings and calendar data, respectively
# '$' and ',' are removed as literal strings (regex=False), then the result is cast to float
listings_df['price_clean'] = listings_df['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
calendar_df['price_clean'] = calendar_df['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
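A slightly more defensive variant of the price conversion, offered as a sketch rather than the method used in this notebook: it strips '$' and ',' as literal strings and uses pd.to_numeric(errors='coerce'), so an unexpected value becomes NaN instead of raising. The helper name parse_price and the sample prices are hypothetical.

```python
import pandas as pd

def parse_price(prices: pd.Series) -> pd.Series:
    """Convert strings like '$1,125.00' to floats; unparseable values become NaN."""
    cleaned = (prices.astype(str)
                     .str.replace("$", "", regex=False)   # literal dollar sign
                     .str.replace(",", "", regex=False))  # thousands separator
    return pd.to_numeric(cleaned, errors="coerce")

print(parse_price(pd.Series(["$295.00", "$1,125.00", "n/a"])).tolist())
# -> [295.0, 1125.0, nan]
```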
Before moving on to data modeling, we conduct a brief exploratory data analysis to get a sense of the main locations, room types and sizes (number of guests accommodated) of the listings.
# show the locations of listings in an interactive map
lats = listings_df['latitude'].tolist()
lons = listings_df['longitude'].tolist()
locations = list(zip(lats, lons))
seattle_coordinates = [47.6097, -122.3331]
loc_map = folium.Map(location=seattle_coordinates, zoom_start=11)
FastMarkerCluster(data=locations).add_to(loc_map)
loc_map.save('listing_locations.html')
loc_map
As expected, most listings are located in the central area of the city. The map is interactive: zooming in on a cluster eventually reveals the individual listing locations.
# plot the distribution of room type in a pie chart (drawn as a donut)
# groupby sorts room types alphabetically; taking the labels from the same index keeps labels and counts aligned
room_counts = listings_df.groupby('room_type')['id'].nunique()
plt.figure(figsize=(5, 5))
my_circle = plt.Circle((0, 0), 0.7, color='white')
plt.pie(room_counts.values,
        labels=room_counts.index,
        autopct='%1.1f%%',
        startangle=90,
        labeldistance=1.1)
plt.axis('equal')
plt.gca().add_artist(my_circle)
plt.savefig('room_type_dist.png', bbox_inches='tight', dpi=150)
In Seattle, the majority of Airbnb listings are entire homes/apartments, while the remaining room types make up a much smaller share.
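The labels of a pie chart must be in the same order as its values. The sketch below uses a made-up room_type series to show why pairing groupby counts (sorted alphabetically) with unique() (order of first appearance) can mislabel slices, and how value_counts keeps each share attached to its own label.

```python
import pandas as pd

# Hypothetical room types, in an order that is not alphabetical
room_type = pd.Series(["Entire home/apt", "Private room", "Entire home/apt",
                       "Shared room", "Entire home/apt", "Hotel room"])

print(list(room_type.unique()))
# -> ['Entire home/apt', 'Private room', 'Shared room', 'Hotel room']
print(list(room_type.groupby(room_type).size().index))
# -> ['Entire home/apt', 'Hotel room', 'Private room', 'Shared room']

# value_counts returns labels and shares together, so they cannot drift apart
shares = room_type.value_counts(normalize=True)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```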
# plot the distribution of the apartment size based on the number of people that can be accommodated
feq = listings_df['accommodates'].value_counts().sort_index()
feq.plot.bar(figsize=(6, 4), width=1, rot=0)
plt.ylabel('Number of listings')
plt.xlabel('Accommodates')
plt.savefig('accommodate_dist.png', bbox_inches='tight', dpi=150)
The most common listing size accommodates 2 people.
# calculate the number of listings by neighbourhood
nbh_count = listings_df.groupby('neighbourhood_cleansed')['id'].nunique().reset_index()
nbh_count.rename(columns={'neighbourhood_cleansed':'neighbourhood'}, inplace=True)
nbh_geo_count = pd.merge(nbh_geo, nbh_count, on='neighbourhood', how='left')
nbh_geo_count['id'] = nbh_geo_count['id'].fillna(0).astype(int)
# calculate the percentage of listings by neighbourhood
nbh_geo_count['pct'] = nbh_geo_count['id'] / nbh_geo_count['id'].sum()
nbh_geo_count['pct_str'] = nbh_geo_count['pct'].apply(lambda x : str(round(x*100, 1)) + '%')
# create a colorbar
nbh_count_colormap = branca.colormap.linear.YlGnBu_09.scale(min(nbh_geo_count['id']), max(nbh_geo_count['id']))
# plot the number of listings by neighbourhood on an interactive folium map
nbh_locs_map = folium.Map(location=seattle_coordinates, zoom_start=11, tiles='cartodbpositron')
style_function = lambda x: {
'fillColor': nbh_count_colormap(x['properties']['id']),
'color': 'white',
'weight': 1,
'fillOpacity': 0.7
}
folium.GeoJson(
nbh_geo_count,
style_function=style_function,
tooltip=folium.GeoJsonTooltip(
fields=['neighbourhood', 'id', 'pct_str'],
aliases=['Neighbourhood', 'Listings', 'Percentage'],
localize=True
)
).add_to(nbh_locs_map)
# add the colorbar to the map
nbh_count_colormap.add_to(nbh_locs_map)
nbh_count_colormap.caption = 'Number of listings by neighbourhood'
nbh_locs_map.save('neighbourhood_listings.html')
nbh_locs_map
The spatial distribution shows that listings are concentrated in two areas. One is the downtown area, spanning the Belltown, Central Business District and Broadway neighbourhoods. The other is the Wallingford-University District area, which includes the University of Washington (UW) campus. Both downtown and the UW campus are attractive places for tourists to visit.
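One way to put a number on this concentration is the share of listings that fall in the k busiest neighbourhoods. The helper top_k_share and the toy labels below are hypothetical; applied to listings_df['neighbourhood_cleansed'], it would report the real share.

```python
import pandas as pd

def top_k_share(neighbourhoods: pd.Series, k: int) -> float:
    """Fraction of listings located in the k most common neighbourhoods."""
    counts = neighbourhoods.value_counts()  # sorted from most to least common
    return counts.head(k).sum() / counts.sum()

# Hypothetical neighbourhood labels for seven listings
toy = pd.Series(["Belltown", "Belltown", "Broadway", "Belltown",
                 "Wallingford", "Broadway", "Fremont"])
print(round(top_k_share(toy, 2), 3))  # -> 0.714  (5 of 7 listings)
```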
Average daily price by neighbourhood
To compare the average daily price across neighbourhoods, we restrict the data to listings for 2 people (the most common size) and keep only neighbourhoods with at least 5 such listings.
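The selection rule can be sketched with a single groupby that computes both the mean price and the number of listings, then keeps neighbourhoods at or above the threshold. This is an equivalent formulation of the filter-then-mean code that follows, not the code used here; the function name and toy data are hypothetical.

```python
import pandas as pd

def avg_price_by_neighbourhood(df: pd.DataFrame, accommodates: int = 2,
                               min_listings: int = 5) -> pd.DataFrame:
    """Mean price per neighbourhood for listings of one size, keeping only
    neighbourhoods with at least `min_listings` such listings."""
    subset = df[df["accommodates"] == accommodates]
    stats = (subset.groupby("neighbourhood_cleansed")["price_clean"]
                   .agg(["mean", "size"]))
    kept = stats.loc[stats["size"] >= min_listings, "mean"]
    return kept.rename("price_clean").reset_index()

# Hypothetical data: A has five 2-person listings, B only two
toy = pd.DataFrame({
    "neighbourhood_cleansed": ["A"] * 5 + ["B"] * 2 + ["A"],
    "accommodates": [2] * 7 + [4],
    "price_clean": [100.0, 110.0, 90.0, 120.0, 80.0, 60.0, 70.0, 300.0],
})
print(avg_price_by_neighbourhood(toy))
```

Neighbourhood B is dropped for having too few 2-person listings, and A's 4-person listing does not affect its average.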
# calculate the average daily price by neighbourhood
nbh_price = listings_df[listings_df['accommodates']==2].groupby('neighbourhood_cleansed') \
.filter(lambda x: len(x) >= 5).groupby('neighbourhood_cleansed')['price_clean'].mean().reset_index()
nbh_price.rename(columns={'neighbourhood_cleansed':'neighbourhood'}, inplace=True)
nbh_geo_price = pd.merge(nbh_geo, nbh_price, on='neighbourhood')
# create a colorbar
nbh_price_colormap = branca.colormap.linear.YlOrRd_09.scale(min(nbh_price['price_clean']), max(nbh_price['price_clean']))
# plot the average daily price by neighbourhood on an interactive folium map
nbh_locs_price_map = folium.Map(location=seattle_coordinates, zoom_start=11, tiles='cartodbpositron')
style_function = lambda x: {
'fillColor': nbh_price_colormap(x['properties']['price_clean']),
'color': 'white',
'weight': 1,
'fillOpacity': 0.7
}
folium.GeoJson(
nbh_geo_price,
style_function=style_function,
tooltip=folium.GeoJsonTooltip(
fields=['neighbourhood', 'price_clean'],
aliases=['Neighbourhood', 'Average price'],
localize=True
)
).add_to(nbh_locs_price_map)
# add the colorbar to the map
nbh_price_colormap.add_to(nbh_locs_price_map)
nbh_price_colormap.caption = 'Average daily price by neighbourhood'
nbh_locs_price_map.save('neighbourhood_prices.html')
nbh_locs_price_map
The costliest neighbourhoods are also in the downtown area, likely reflecting high demand, while average prices around the UW campus are far lower. Other relatively expensive neighbourhoods, such as West Woodland and North Beach/Blue Ridge, are on the waterfront.
Availability over time
# calculate the sum of available listings by date
sum_available = calendar_df[calendar_df['available'] == 't'] \
.groupby(['date']).size().to_frame(name='available').reset_index()
# convert 'date' to 'weekday'
sum_available['date'] = pd.to_datetime(sum_available['date'])
sum_available['weekday'] = sum_available['date'].dt.day_name()
# plot the sum of available listings by date
fig = go.Figure(data=go.Scatter(x=sum_available['date'],
y=sum_available['available'],
text=sum_available['weekday']))
# set the layout
fig.update_layout(
autosize=False,
width=480,
height=360,
margin=dict(l=0, r=0, t=30, b=0),
xaxis_title = 'Date',
yaxis_title = 'Number of listings available'
)
fig.write_html('availability_over_time.html')
fig.show()
There are generally more listings available up to three months ahead than further into next year. Part of the reason might be that hosts update their calendars more actively within this timeframe. Seasonality may also play a role: given Seattle's rainy winters, many visitors may prefer to come in summer or autumn.
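To check the "more availability within three months" observation numerically, one can compare the average daily number of available listings in two windows of days after the first calendar date. The sketch assumes a frame shaped like sum_available (a datetime date column and an available count); the toy counts are invented.

```python
import pandas as pd

def mean_available(daily: pd.DataFrame, first_day: int, last_day: int) -> float:
    """Average available listings from first_day (inclusive) to last_day
    (exclusive), counted in days after the earliest date in `daily`."""
    offset = (daily["date"] - daily["date"].min()).dt.days
    mask = (offset >= first_day) & (offset < last_day)
    return daily.loc[mask, "available"].mean()

# Hypothetical daily counts: 100 available for 90 days, then 60
dates = pd.date_range("2020-10-25", periods=365, freq="D")
toy = pd.DataFrame({"date": dates,
                    "available": [100] * 90 + [60] * 275})
print(mean_available(toy, 0, 90), mean_available(toy, 90, 365))  # -> 100.0 60.0
```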
Average price by date
listings_df.rename(columns={'id':'listing_id'}, inplace=True)
calendar_df = pd.merge(calendar_df, listings_df[['listing_id','accommodates']], on='listing_id', how='left')
average_price = calendar_df[(calendar_df['available'] == 't') & (calendar_df['accommodates'] == 2)] \
.groupby(['date'])['price_clean'].mean().reset_index()
average_price['date'] = pd.to_datetime(average_price['date'])
average_price['weekday'] = average_price['date'].dt.day_name()
fig = go.Figure(data=go.Scatter(x=average_price['date'],
y=average_price['price_clean'],
text=average_price['weekday']))
fig.update_layout(
autosize=False,
width=480,
height=360,
margin=dict(l=0, r=0, t=30, b=0),
xaxis_title = 'Date',
yaxis_title = 'Average price of 2p accommodation'
)
fig.write_html('average_2p_price_over_time.html')
fig.show()
The average daily price of a 2-person place peaks at about $132 on September 4, 2021 (a Saturday), and the weekly cyclical pattern comes from higher prices on weekends.
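The weekend effect can be quantified as the gap between the average weekend and weekday prices. The sketch below assumes a frame shaped like average_price (a datetime date column and a price_clean column) and treats Friday and Saturday nights as the weekend, which is an assumption; the toy prices are invented.

```python
import pandas as pd

def weekend_premium(average_price: pd.DataFrame) -> float:
    """Mean price on Friday/Saturday nights minus mean price on other nights."""
    # dayofweek: Monday=0 ... Friday=4, Saturday=5
    is_weekend = average_price["date"].dt.dayofweek.isin([4, 5])
    return (average_price.loc[is_weekend, "price_clean"].mean()
            - average_price.loc[~is_weekend, "price_clean"].mean())

# Two hypothetical weeks: 150 on Friday/Saturday nights, 100 otherwise
dates = pd.date_range("2021-08-30", periods=14, freq="D")
price = [150.0 if d.dayofweek in (4, 5) else 100.0 for d in dates]
toy = pd.DataFrame({"date": dates, "price_clean": price})
print(weekend_premium(toy))  # -> 50.0
```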
Lastly, we want to find out which property features (e.g., proximity to restaurants and shops, hygiene, safety) lead to a good rental experience, and to explore some of the worst reviews. Here we use VADER, a lexicon- and rule-based sentiment analysis tool available as a Python package, to compute the polarity score of each comment.
The advantages of VADER over traditional methods of sentiment analysis include:
# create the analyzer once and reuse it for every comment
analyzer = SentimentIntensityAnalyzer()

def get_sentiment(text):
    '''Get the compound polarity score of a text

    Args:
        text: (str) the text of a comment

    Returns:
        polarity_score: (float) the compound polarity score of the text
    '''
    # str() turns a missing comment (NaN) into the string 'nan'
    polarity_score = analyzer.polarity_scores(str(text))['compound']
    return polarity_score

reviews_df['polarity_score'] = reviews_df['comments'].apply(get_sentiment)
reviews_df[['comments', 'polarity_score']].head()
| | comments | polarity_score |
|---|---|---|
| 0 | 1000 times better than staying at a hotel. | 0.4404 |
| 1 | Our family (two couples, a two year old and an... | 0.9976 |
| 2 | Top of the list locations we have stayed at! T... | 0.9796 |
| 3 | SUCH an awesome place. Very clean, quiet and s... | 0.9059 |
| 4 | We flew quite a distance to be at our only dau... | 0.9184 |
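For reference, VADER's compound score is the sum of the (rule-adjusted) valence scores of the words in a text, mapped into [-1, 1] by x / sqrt(x² + α) with α = 15 in the vaderSentiment source. The function below re-implements that mapping as a sketch to show how it behaves; it is not the package's own API.

```python
import math

def normalize(score: float, alpha: float = 15) -> float:
    """Map an unbounded sum of valences into [-1, 1], as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    # defensive clipping; the formula itself already stays inside (-1, 1)
    return max(-1.0, min(1.0, norm))

print(normalize(0.0))   # -> 0.0
print(normalize(1.0))   # -> 0.25
print(normalize(-1.0))  # -> -0.25
```

Because the mapping saturates, long and enthusiastic reviews pile up near 1, which matches the scores around 0.9995 in the best reviews below.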
# print number of positive, negative and neutral compound polarity scores
print('Positive compound scores: {}'.format(reviews_df[reviews_df['polarity_score'] > 0]['polarity_score'].count()))
print('Negative compound scores: {}'.format(reviews_df[reviews_df['polarity_score'] < 0]['polarity_score'].count()))
print('Neutral compound scores: {}'.format(reviews_df[reviews_df['polarity_score'] == 0]['polarity_score'].count()))
Positive compound scores: 261464
Negative compound scores: 3013
Neutral compound scores: 6398
An overwhelming majority of reviews are positive: 261,464 of 270,875 (about 96.5%) have a positive compound score. Now let's take a closer look at a few examples of the best and worst reviews.
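The shares can be reproduced from the printed counts with plain Python arithmetic. Note that the three counts add up to all 270,875 rows, including the 123 rows whose comment was missing.

```python
# Counts printed above
counts = {"positive": 261464, "negative": 3013, "neutral": 6398}
total = sum(counts.values())
print(total)  # -> 270875

for label, n in counts.items():
    print(f"{label}: {n / total:.1%}")
# -> positive: 96.5%
# -> negative: 1.1%
# -> neutral: 2.4%
```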
# explore the best reviews (ascending sort, so the highest score prints last)
best_comments = reviews_df[['comments', 'polarity_score']].sort_values(by='polarity_score').tail(5)
for index, row in best_comments.iterrows():
    print("Review (score: {}): {} \n".format(row['polarity_score'], row['comments']))
Review (score: 0.9995): This was our third trip to Seattle to visit our son and our first experience with Airbnb. We definitely hit the jackpot with finding Carter’s house. It is beautiful, comfortable, wonderfully decorated, private, and has a “million dollar view.” The view looks like a painting from every angle. The ferry moving silently in and out of the dock confirms that it’s not a painting but real…even well into the night when you watch the lights of the ferries criss-cross across the Sound in silence. We were lucky to have nice weather for our six days and enjoyed seeing the sunset every evening. It was uncharacteristically warm for Seattle and Carter’s house is one of the few that has air-conditioning. The house has everything you could want or need. The well-equipped kitchen is a delight to prepare meals in. The master bedroom and bathroom are exquisite. The rain shower is every bit as wonderful as is described. The two sitting rooms are gorgeous and a pleasure to relax in. My teen son enjoyed the view he had from the Murphy bed that was his for the trip. There is a great sound system and tons of CDs to choose from. The neighborhood is nice and quiet and the location was perfect for everything we did. My Seattle-based son enjoyed coming over with his girlfriend after work. We got to enjoy wonderful quality time preparing meals together and dining and laughing on the wonderful deck. Carter was a great host. He made sure we had everything we needed to enjoy the house. He was great to communicate with and always a quick text away. He even made sure to let us know that we could harvest figs from the tree in his backyard. That was a delicious treat and we were there right at the perfect time! This was definitely the best visit we had with our son and so much more enjoyable than staying in a hotel. Thank you, Carter, for hosting us in your beautiful home. And thank you, Airbnb, for letting us find Carter! We will be back! 
Review (score: 0.9995): We flew into Seattle quite early to get our tourist on as fast as we could unfortunately, our check-in time did not correspond to our itinerary but Minh-Chau was extremely flexible and let us store our luggage in a locked unit in front of the rental on their property. While we were out at dinner Minh-Chau texted and asked if we found the place OK and how things were going. We let her know that we found the place no problems and that due to her great directions we were able to get there no problem, store our stuff and move on with our day. When we arrived to our room later on that afternoon the place was spotless and she had left the window slightly open to let in all that good Pacific Northwest fresh air in. There were chocolates and a nice welcome note left on the bed for us. As promised there were two water bottles chilled for our consumption and fluffy fresh towels available. The room was really plenty comfortable and cozy for two people. Great bed to sleep in, warm shower available, clean space and a nice sized TV for consumption if you want. What more could one ask for? The place was located in a quiet and peaceful neighborhood and very close to public transportation. There was construction happening in the neighborhood (the whole city is a construction maze FYI) but it surprisingly not disruptive to our stay. Initially, we were a bit confused about the bus routes so we used Uber (which is surprisingly super affordable there). After a little assimilation we were able to navigate quite easily. We even took public transportation all the way out to Discovery Park for a nice hike! The best part was Minh-Chau knew we were celebrating a birthday and had a long car journey the next day. After a little meet and greet one afternoon with herself, her lovely husband and adorable son she came back down with fresh baked scones!! What the what?! They were AMAZING and it was just SO thoughtful! 
She made enough for dessert that night AND breakfast the next morning. She and her family are so so gracious and a little part of me wishes we could've connected face to face earlier in our trip so we could have taken advantage of her vast knowledge and recommendations of the area. Thanks Minh-Chau and family for welcoming us into your home it was definitely such a comforting place to be away from home. We certainly hope to stay there again next time we're visiting Seattle! Review (score: 0.9995): WOW. Just WOW. My 17 year-old daughter and I stayed here for 3 nights while college-hunting in the greater Seattle area. Jennifer's place lives up to every single glowing review and we could not have enjoyed our stay more! Location: Fantastic, quiet, safe "neighborhood" location about 20 minutes from the airport, and about half that to downtown, and within about a mile of the University of Washington, which means ample access to local services and dining options. Lots of neighbors & kids out walking, biking and visiting - a nice feeling. We rented a car, but there are electric-boosted Lime Bikes (or similar) everywhere (you need them; it's hilly), and public transportation in this area is fantastic if you want to explore. There is a beautiful outdoor mall within a mile or two with lots of great dining and shopping options...plus two full-serve grocery stores if you need them. The nearby Metropolitan Market was wonderful and we also enjoyed a wonderful meal at Mioposto nearby (so good w almost went back a second time!). Decor: Jennifer's home is absolutely precious! It's a beautiful, old home she has obviously poured a lot of love into restoring...tons of charm! Beautiful, lush landscaping, a lovely private side-yard pathway entrance, charming strung lights and outdoor seating areas in her private, cozy back yard. The loft itself has a private entrance and several short sets of stairs...and the loft itself is beautifully decorated and well-appointed. 
The beds are quite comfortable and the overhead sky-light in the bedroom was a nice touch. The bathroom is large and appointed with lovely marble tile surrounds and new fixtures. The towels, rugs, linens....all top-notch, super luxe, and beautifully laundered. Oh, and if you're tall...the shower head is wonderfully high...although you'll need to duck when entering the loft ;) Amenities: She left no details unmet! The "kitchenette" has a nice big sink, small refrigerator, microwave, hot plate, small cock-pot, full sized coffee maker, electric kettle, and enough basic dishes and glasses to make a lovely, simple meal. We bought fresh juice, fruit, and quiche at the Metropolitan Market, so breakfast was a snap! Jennifer even provided us with a fresh pound of delish coffee, some fruit, granola bars, sugar, and fresh heavy (Website hidden by Airbnb) even if you arrive late at night (as we did) you're set for coffee in the morning! She had a few fans stashed so we could cool down on an unexpectedly warm summer days, and the bedroom has an efficient window A/C that cools it down nicely if you need. She had hangers, an assortment of full-sized shampoos, conditioners, body washes, an iron/ironing board and a small hairdryer in case you need any of these...and she also had a few re-usable canvas bags available to borrow in case you needed some for shopping! This loft, the host and her pup (Jamie...who loves to play but never made a peep the entire time) are WONDERFUL! We'll certainly be back!
Review (score: 0.9996): Where to start? This is the second time my husband and I have stayed here, but it most certainly is not just the second time we've tried! We come to Seattle often, but have been unlucky in scoring SkyCabin again until just recently. The neighborhood (Eastlake) is wonderful, with good walking opportunities, and nice little restaurants and coffee shops nearby. There's a great little market/wine shop that requires a picturesque little walk to get to.
One day my husband and I walked around the whole of Lake Union, and Gillian offered up some great tips. We'd take that walk again in a heartbeat. Now, the place itself! The view from SkyCabin of Lake Union is extraordinary, as it's unimpeded, and well, it's just a gorgeous view! The living room, kitchen and yoga/office have the same view overlooking the lake, with large and beautiful windows. The yoga/office is off a deck, by the way. Man, we loved that deck; I just wish it had a BBQ. I love the big, interesting and colorful rug in the living room, the large statement lamp, the cool and comfortable couch, and all the fun and thoughtful touches throughout. The kitchen is quite tiny, but oh-so-charming. It's incorporated into the living area, so those magnificent views are right there to be had. The bedroom has the same view if the doors to the living room are open. The bedroom is cozy and a little darker, but it has nice windows to open up for a cross-breeze. Off the bedroom is a bathroom with the best shower/tub combo (all beautifully tiled) with multiple heads. And heated floors! While that wasn't something we had to have in the summer, I can imagine how delicious that would be on a cold and rainy day. Something else that I loved about the place was the lighting. There were lovely lamps all around, and dimmers on overhead lighting. This creates a warm and cozy environment, and makes it so much more visually interesting and appealing. And the light-filtering shades that can be pulled down on all the lakeside windows are a great idea, as it can get warm up high in that neat old house. There is no A/C, but Gillian provided a great fan, and it cooled down nicely at night. Skycabin feels like what its name implies--that you're in a cozy little place up high. The warm birch veneers and skylights help make it feel this way, as well. And as for Gillian and Tony--they're so great! They're friendly and informative, easy-going, and quick to respond. 
We so enjoyed our interactions with them. Skycabin is a place we hope to return to again and again.
Review (score: 0.9997): Awesome location about a mile from downtown Ballard with tons of cute places to visit. I did brewery and distillery tours and lots of gift shopping for the holidays. Lara also lent me a great bike that got me super easily over to the Fremont area as well, where there is more of the same- great restaurants etc. The bike trail (Burke-Gilman) is just at the bottom of the hill from the house which is in a very nice quiet neighborhood. I did not ride super far on the trail, but apparently it is quite long and will take you to tons of great Seattle spots! The room itself is more than a room- has its own bathroom, kitchenette and everything I would need if I wanted a totally private stay. Pretty much the entire downstairs to the whole house is all yours. Very clean and comfortable with a fun tiki theme. It was raining when I was up there and somewhat cold, but I was plenty warm and cozy in the space and Lara left out extra blankets just in case. As it was wet outside, I did not utilize the patio, but nice area with a little fire pit and plenty of seating that would be great for use during better weather. Would be an awesome reading spot with all of the passion fruit vines in bloom! More on the host: As I mentioned before, I could have had a completely private and solitary stay if desired, but Lara was super helpful and sweet. When getting all the bike stuff from her she was also great about giving directions and suggestions which were all perfect for what I was looking for. We actually walked to a little happy hour spot for some evening snacks and had a great visit. Awesome lady who has travelled a lot, as have I, with tons of great stories. Very friendly and genuine, but also perfectly respectful of "my" space downstairs. Overall: would highly recommend this place!
Amazing location, super digs that are clean and comfortable and private as well as my favorite airbnb hosts so far! And of course the icing on the cake: the bicycle borrowing availability!...and the easy access to trails and local spots via bike- or walking for that matter- major bonus in my travel preferences. Would definitely stay at Lara's again and will most likely try to book again soon when I'm up there checking out grad school at UW! Oh yeah- forgot to mention, she has the cutest little puppy! A bit shy at first, but once she warms up, she is the biggest sweetheart with the funkiest hair do ever;) Love love love!
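All of the top-scoring reviews above are long. A plausible explanation, assuming polarity_score holds VADER's compound score (the scoring code is not part of this section): VADER adds up the valence of every sentiment-bearing word in a review and then squashes that sum into [-1, 1] with x / sqrt(x² + α), where α = 15. The more sentiment words a review contains, the closer its score gets to ±1. The sketch below re-implements only this normalization step; the raw valence sums fed into it are made-up numbers.

```python
import math

def vader_normalize(score, alpha=15):
    """Squash a raw sum of word valences into [-1, 1], as VADER's compound score does."""
    norm_score = score / math.sqrt(score * score + alpha)
    # clip to [-1, 1], mirroring VADER's own implementation
    return max(-1.0, min(1.0, norm_score))

# hypothetical raw valence sums: a short review vs. progressively longer ones
for raw in (2.0, 10.0, 50.0, 150.0):
    print(f"raw sum {raw:>6}: compound {vader_normalize(raw):.4f}")
```

Because the curve flattens near ±1, sorting by the compound score tends to surface long, effusive (or long, furious) reviews at both ends of the ranking.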
# explore the worst reviews
worst_comments = reviews_df[['comments', 'polarity_score']].sort_values(by='polarity_score').head(5)
for index, row in worst_comments.iterrows():
    print("Review (score: {}): {} \n".format(row['polarity_score'], row['comments']))
Review (score: -0.996): Renting an AirBnB was supposed to be both cheaper and more comfortable than checking into a hotel. It was neither. After spending a little over $3,000 on checking in, and registering my pets as guests as well, I was asked for another $750 as a pet deposit. This made sense, since I would expect the need for a deep cleaning. The woman who met us at the condo was very nice, though I suppose we should have asked her to take pictures of all of the problems we found on check in so we could clearly show it was not our doing, but we did not expect to be charged any more. The area itself is very nice, though the fact that there was a pool and gym did not mean anything to us since we stayed during the COVID-19 lockdown. The bed is not comfortable, nor are the pillows. The bed is thin and has no box spring, meaning you can feel the slats underneath it. It was terrible to sleep on. So terrible that we purchased inflatable mattresses a week into our stay to avoid sleeping on it any longer. The pillows were either likewise shapeless and thin or too thick and long. We also purchased our own pillows to use. The couch is a cheapish but nice looking - but entirely uncomfortable - convertable futon bed. I would not recommend anyone sleep on this bed, let alone use it to sit on since sitting on it for long periods makes your back hurt. I myself am partially disabled and sleeping here has set me back weeks in productivity and range of movement. There was one other funky looking blue chair that again, was very nice looking, but uncomfortable. Nothing in this place is comfortable, at all. That would be fine for a few days or maybe a week, but a month here was awful. The rug had a stain on it that we were told was marked when we checked in, but according to a complaint from the owner we are expected to pay for a new rug for a stain we did not leave. 
There were scuffs, paint chips, a broken fridge door and other very 'lived in' things that we did not find to be a deal breaker, but certainly did not expect to be blamed for. The bath mat is slippery and shoddy, we slipped on it repeatedly but did not remove it from the tub because no one had cleaned underneath it previously and it was very gross looking. The internet cord was frayed when we arrived and went out frequently - until halfway through the trip when it stopped working entirely. We were told that it would be fixed in the last week we were there, and had a time set for someone to come, but no one did and no one called us to follow up. It was so close to check out we just said forget it. This has soured our interested in AirBnB as a whole, especially since the owner expects another almost $700 from us for damages we did not do. They accused us of stealing multiple mugs. There were two, both cracked. One broke, and we did admittedly forget to replace it. We figured that since they were obviously cheap coffee mugs to begin with it would not matter, but are being charged $26 for however many the owner decided to say we took, which is a lie. Do not stay here for extended periods, and make sure to be firm about getting pictures for when you check in to avoid being blamed for things you did not do.
Review (score: -0.9958): TERRIBLE!!! It's dirty, smelly, and in absolutely horrendous condition!! The place is the biggest dump I've ever seen in my life!!! It is beyond dirty, it smells like the bottom of a garbage bin, and it looks and feels like a meeting place for pimps and prostitution. More importantly, the owner is the most rude, unprofessional, unfriendly, inconsiderate and arrogant Air B&B owner I've ever encountered in my past Air B&B history.
I had an issue that required his attention, and no matter how many times I emailed him and asked him to call me, over and over again, explaining that I have a real issue that needs his attention ASAP, he completely ignored me and refused to as much as reply to me by email or by phone in order to assist me with this issue. I can't even believe anyone would stay at a dog house like this, but even more so, when the owner is so incredibly rude, careless, inconsiderate and impolite!! This is the first time I ever write a negative review about an Air B&B, but I had to let everyone know how terrible my experience was and how dishonest and rude this guy is. BAD PLACE!!! STAY AWAY!!!!!
Review (score: -0.9944): I highly encourage you to read this entirely for the sake of your well being. The apartment smelled musty and sneezing and wheezing ensued (as I am asthmatic). Extra pillows had weird gross brown and yellow stains and not enough pillow cases. When we retrieved a blanket from the closet there was gross stuff on it too. The sheets were dark brown so who knows if they were clean. The mattress in the master bedroom was old. You could tell by even looking at it. The pool and hot tub were filthy and covered in dirt. The hot tub in the other building was broke. The halls reeked of trash. Pictures were taken of these issues. Amenities are sprawled out through multiple towers and getting to them is difficult and confusing. We even encountered others who were wandering around and frustrated saying they hated this place. You have to walk through the parking garage or the public in your swimsuit to get to the pool. They were out of parking passes so we had to pay the regular fee. The advertisement makes it seem upscale when it is far from it. It may have been at one time, but not now. The downtown location is excellent but the location of amenities is terrible. One of the rails on the balcony (of the 22nd floor) was loose. My teenager noticed and then I did as well.
Not a safe feeling. Horrible parking garage. Poor signage, hard to find the correct elevator to the coordinating tower, extremely tight to navigate and insulation hanging from the 6’ 5” ceiling. We went to the basement to get to the pool and after discovering it was filthy we went to go back. The door handle was broke off! Picture taken, and no you couldn’t use it. We went to use the stairs and the door was locked! We went back in the elevator and it beeped and got stuck for a moment then finally let us off at the sky bridge. When we got to the lobby the fire dept was there because there was a fire on the tenth floor and the sprinklers went off and they had shut down the elevators. So there we sat in our swim gear for an hour before one of the firemen escorted us up (because the elevators were still not working). When we asked if this happens a lot he said yes they are there a lot. We had quite enough and just wanted to leave as it was entirely disappointing and we felt unsafe, so we collected our things and walked the 22 flights down with our luggage. After returning the keys it took half an hour to get to our car as so many areas lock you out. We couldn’t even exit to the garage at one point without the keys. Not safe at all. It was entirely frustrating and I am appalled at the price for the quality, the layout and the chaos. Even the concierge apologized and described it as a nightmare.
Review (score: -0.9941): Staying at Robert’s place was a nightmare. At 8 am on the first morning, we were woken up by very loud hammering and banging occurring literally right outside our window. The room was shaking that we thought this was an earthquake. We opened the curtain and saw a construction worker on the roof 2 feet from our bedroom window. We immediately called Robert, who unhelpfully said he had no control over this and that we could cancel our reservation or stay in his basement.
The first thing I did when I arrived was to make my boyfriend check out his dark basement, so it obviously didn’t make me feel comfortable to sleep in there. It also wouldn’t have helped because there was a loud construction machine right in front of where the basement would be. The construction occurred again on the next morning. Robert knew of this construction from his previous guests and made NO mention of these major disturbances in the listing or communication with us prior to our arrival. He made NO effort to contact the workers and work out a different time for them to do work on his roof. The work is done on HIS house, but he left his guests to suffer through the disturbances for two mornings in a row and deal with the situation without any input. His attitude was simply I have no control over this, deal with it on your own or leave. When we filed for a petition for get a partial refund, Robert was very condescending and rude in his responses. He called us “high maintenance prima donnas” and suggested that we were immature for sleeping in past 8 AM on my weekend vacation in Seattle, as his other guests were all “mature” and left early in the morning so they were not disturbed by the construction work. He even flat out lied about other things (that he offered us a refund being one). There were other communication issues and things that he tried to blame us for. We arrived late at night in October, when the outside temperature is in the low 60s. There are no temperature controls or air vents in the room. The only heating source in the room is this arcane water heater, so we set the temperature to low 70s. However, after we raised our concerns about construction, Robert blamed us for violating his house rules by changing the temperature and wanted to charge us for it. He then mentioned there were space heaters, but he never mentioned this to us before or during our stay. 
How can you expect guests to stay in a room in the 60s without providing any source of heating and expect them to be comfortable? Overall, this was a horrible Airbnb experience for my boyfriend (who has stayed in over 10 Airbnbs) and I. The host was incommunicative, inconsiderate, and insulting. We suggest thinking twice before staying with Robert.
Review (score: -0.9937): [Translated from German] The accommodation is wonderful. It is a self-contained apartment with everything you need. The kitchen has everything needed for cooking, and there is a coffee machine for the mornings. The bathroom is very nicely done and the shower is simply great, with enough room for two people to shower together. If you forgot shower gel, shampoo or conditioner, there is even some ready here, and it is not that cheap hotel stuff but really good shower gel and shampoo. Besides the lounge area with a large table and chairs, there is also a rowing machine that invites you to exercise. The beds are comfortable and there is a large wardrobe if you need it. There is also a large TV in the apartment with Netflix access. Everything is rounded off by a welcome basket with noodles, pasta, chocolate, muesli, coffee (very, very tasty) and two bottles of wine, so everything you need for the first night and the first morning is there. The location is not central downtown, but central for Seattle, so downtown is about as far away as Lake Washington. If you don't feel like walking (30 min) to downtown or to Lake Washington, the bus is an excellent option. The location is also very quiet; the only things you hear now and then are the washing machine and the dogs when they run excitedly through the apartment above. All in all a truly wonderful place to stay, definitely recommended to everyone. We would come back if we find ourselves in Seattle again.
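We find that the most negative reviews complain about dirty or damaged units, unresponsive or rude hosts, unannounced construction noise and unexpected extra charges. The fifth-lowest score, however, belongs to a glowing review written in German: VADER is built for English text, so it misreads non-English comments. For this reason we drop non-English comments before building the negative word cloud.
To compare the vocabulary of positive and negative reviews, we first define a tokenizer that cleans each comment and turns it into a list of lemmatized words.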
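# resources required by word_tokenize and WordNetLemmatizer
# (newer NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')
nltk.download('wordnet')
# build the stop word set and the lemmatizer once instead of once per word
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
def tokenize(text):
    '''Transform raw text data to a bag of words
    Args:
        text: (str) raw text data
    Returns:
        tokens: (list) a list of strings after tokenization
    '''
    # remove any hyperlinks (raw strings avoid invalid escape sequence warnings)
    text = re.sub(r"((\S+)?(http(s)?)(\S+))|((\S+)?(www)(\S+))", " ", text)
    # normalize case and replace everything except letters with spaces
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())
    # tokenize text
    tokens = word_tokenize(text)
    # lemmatize and remove stop words
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in STOP_WORDS]
    # remove all words with one or two letters since those do not bring useful information
    tokens = [word for word in tokens if len(word) > 2]
    return tokens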
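To see what each cleaning step does, here is a dependency-free sketch of the same pipeline. It swaps NLTK's word_tokenize for str.split, uses a small hypothetical stop-word set (a handful of words taken from NLTK's English list) and skips lemmatization, so its output can differ from tokenize on real comments.

```python
import re

# hypothetical, deliberately tiny subset of NLTK's English stop words
TOY_STOP_WORDS = {"the", "was", "and", "very", "at", "a", "is"}

def toy_tokenize(text):
    """Mimic tokenize() without NLTK: whitespace splitting, no lemmatization."""
    # same hyperlink pattern as tokenize(): drop any token containing http(s) or www
    text = re.sub(r"((\S+)?(http(s)?)(\S+))|((\S+)?(www)(\S+))", " ", text)
    # lowercase and replace every non-letter (digits, punctuation, emoji) with a space
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())
    tokens = [word for word in text.split() if word not in TOY_STOP_WORDS]
    # drop one- and two-letter words
    return [word for word in tokens if len(word) > 2]

print(toy_tokenize("The host was GREAT! Details at https://example.com, 10/10 and very clean :)"))
```

The URL, the rating "10/10" and the emoticon all disappear, leaving only the content words host, great, details and clean.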
Here, instead of simply counting word frequencies across the corpus, we rank words by term frequency-inverse document frequency (TF-IDF). TF-IDF down-weights terms that occur in many comments, so generic words shared by most reviews rank lower than words that are distinctive for a subset of them.
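As a worked sketch of the weighting, assuming scikit-learn's documented default smooth_idf=True together with the sublinear_tf=True and norm='l2' options used below: a term t in comment d gets weight (1 + ln tf) × idf(t), with idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of comments and df(t) the number of comments containing t. Each comment's vector is then scaled to unit length. The toy corpus is made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF with sublinear tf, smoothed idf and L2 normalization (scikit-learn's formulas)."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: (1 + math.log(c)) * idf[t] for t, c in tf.items()}
        # avoid dividing by zero for empty documents
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return idf, vectors

# made-up, already tokenized comments
docs = [["place", "clean", "quiet"],
        ["place", "dirty", "dirty", "smell"],
        ["place", "great", "view"]]
idf, vectors = tfidf_vectors(docs)
print(idf["place"], idf["dirty"])  # the term found in every comment gets the lowest idf
```

"place" appears in every comment, so its idf is exactly 1, while "dirty" (one comment) gets 1 + ln 2; in the second comment "dirty" outweighs "smell" because it occurs twice, but only by the sublinear factor 1 + ln 2 rather than a full factor of 2.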
tfidf = TfidfVectorizer(sublinear_tf=True,
                        min_df=5,
                        norm='l2',
                        encoding='latin-1',
                        ngram_range=(1, 2),
                        stop_words='english',
                        tokenizer=tokenize)
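Note that we pass some additional parameters to the TF-IDF class:
sublinear_tf=True: replace each raw term count tf with 1 + log(tf), so a word repeated many times in one review does not dominate.
min_df=5: keep only terms that appear in at least 5 comments.
norm='l2': scale each comment's feature vector to unit Euclidean length, so long and short reviews are comparable.
ngram_range=(1, 2): consider both single words (unigrams) and pairs of consecutive tokens (bigrams).
stop_words='english': drop scikit-learn's built-in list of English stop words.
tokenizer=tokenize: split each comment with the custom tokenize function defined above.
encoding='latin-1': only used to decode bytes input; it has no effect here because the comments are already Python strings.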
Reference: https://cloud.google.com/blog/products/gcp/problem-solving-with-ml-automatic-document-classification
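The effect of ngram_range=(1, 2) and min_df can be illustrated without scikit-learn. The sketch below builds unigrams and bigrams from already tokenized comments (scikit-learn likewise forms n-grams after removing stop words) and keeps only the n-grams whose document frequency reaches min_df. The comments are made up, and min_df=2 is chosen instead of 5 so that the tiny corpus keeps something.

```python
from collections import Counter

def ngrams(tokens, ngram_range=(1, 2)):
    """All n-grams of the given lengths, joined by a space like scikit-learn's feature names."""
    low, high = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(low, high + 1)
            for i in range(len(tokens) - n + 1)]

def vocabulary(docs, ngram_range=(1, 2), min_df=2):
    """Keep the n-grams that occur in at least min_df documents."""
    # count each n-gram at most once per document
    df = Counter(g for doc in docs for g in set(ngrams(doc, ngram_range)))
    return sorted(g for g, count in df.items() if count >= min_df)

docs = [["walk", "downtown", "quiet", "street"],
        ["easy", "walk", "downtown"],
        ["quiet", "street", "great", "host"]]
print(vocabulary(docs))
```

Bigrams such as "walk downtown" and "quiet street" survive because they recur across comments, while one-off pairs like "great host" are dropped by the document-frequency threshold.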
Next, let’s compare the features of accommodations that received positive vs. negative reviews via word clouds.
import numpy as np
def plot_wordcloud(comments, output_filename, background_color='white', max_words=100):
    """Generate a word cloud image given a corpus of comments
    Args:
        comments: (iterable) a corpus of comments
        output_filename: (str) the output file name, without extension
        background_color: (str) background color for the word cloud image
        max_words: (int) the maximum number of words
    Returns:
        None
    """
    features = tfidf.fit_transform(comments)
    # sum each term's TF-IDF weight over all comments; get_feature_names()
    # was removed in scikit-learn 1.2, so use get_feature_names_out()
    scores = pd.Series(np.asarray(features.sum(axis=0)).ravel(),
                       index=tfidf.get_feature_names_out())
    # generate word cloud
    wordcloud = WordCloud(background_color=background_color, max_words=max_words,
                          width=1600, height=800).generate_from_frequencies(scores.to_dict())
    # display the generated image and save it as a PNG file
    plt.figure(figsize=(12, 6))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.savefig(output_filename + '.png', bbox_inches='tight', dpi=150)
    plt.show()
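WordCloud.generate_from_frequencies expects a mapping from word to weight, and summing each TF-IDF column over all comments (as plot_wordcloud does) produces one. A pure-Python sketch of that aggregation step follows; the per-comment weights are made up and stand in for rows of the TF-IDF matrix.

```python
from collections import Counter

def aggregate_weights(doc_vectors):
    """Sum per-document TF-IDF weights into one score per term."""
    totals = Counter()
    for vector in doc_vectors:
        totals.update(vector)  # Counter.update adds values for matching keys
    return totals

# made-up per-comment TF-IDF weights (each dict is one comment)
doc_vectors = [{"clean": 0.8, "quiet": 0.6},
               {"clean": 0.5, "view": 0.87},
               {"view": 0.7, "walk": 0.71}]
totals = aggregate_weights(doc_vectors)
print(totals.most_common(2))  # the words that would be drawn largest
```

Summing favours terms that show up in many comments; averaging over the comments that contain a term would instead favour terms that are intense but rare. Which one suits a word cloud is a judgment call.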
from langdetect import DetectorFactory
from langdetect.lang_detect_exception import LangDetectException
DetectorFactory.seed = 0  # make langdetect results reproducible
def is_english(text):
    '''Return True if langdetect classifies the text as English'''
    try:
        return detect(text) == 'en'
    except LangDetectException:  # e.g. comments without any letters
        return False
sorted_comments = reviews_df.sort_values(by='polarity_score')['comments']
# generate a word cloud for the 500 most positive comments
plot_wordcloud(sorted_comments.tail(500), 'pos_wordcloud')
# generate a word cloud for the 500 most negative comments, keeping English ones only
neg_comments = sorted_comments.head(500)
plot_wordcloud(neg_comments[neg_comments.apply(is_english)], 'neg_wordcloud')
Unsurprisingly, the top positive terms describe the surroundings and the neighborhood: walk, restaurants, food, shop, bus, park, view, safe and quiet.
In contrast, the top negative terms include dirty, broken, noise, smell, old and small. Guests also seem to run into frequent problems with amenities: shower, door, towel, window, toilet and sheet all appear among the top negative terms.